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ABSTRACT 

In this paper we consider several applications of bilinear 
stochastic models in which state estimation is an important problem. 

Bilinear stochastic models occur naturally in many communication problems, 
including noisy oscillators and phase- lock loops, in which the system 
evolves on the circle S^. Similar models arise in the estimation of the 
position of an orbiting body (in which the state evolves on the 2-sphere 
S 2 ) and in the estimation of the orientation of a rotating rigid body 
(which evolves on SO(3)). The advection-dif fusion model of air pollution 
involves partial differential equations which, when discretized, include 
bilinear stochastic terms due to the random fluctuations in wind velocity 
and source rate. 

Three techniques for the solution of bilinear estimation problems 
are presented. First, finite dimensional optimal nonlinear estimators 
are presented for certain bilinear systems evolving on solvable and nilpotent 
Lie groups. Then the use of harmonic analysis for estimation problems 
evolving on spheres and other compact manifolds is investigated. Finally, 
an approximate estimation technique utilizing cumulants is discussed. 
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I . Introduction 


As is well known , the class of linear dynamical systems with white 
gaussian driving and observation noises is particularly appealing in that 
optimal estimation and control systems can be readily determined and are 
easily implemented (perhaps with the aid of a digital computer) . Un- 
fortunately, there exists no such "nice" theory for general finite- 
dimensional nonlinear systems, and until recently most nonlinear estimation 
problems were "solved" by various types of linearization and vector 
space approximation methods. 

Recently, a great deal of effort has gone into studying a class 
of nonlinear systems that possesses a great deal of structure itself — 
the class of bilinear systems* Several authors have been able to devise 
analytical techniques for such systems that are as detailed and as power- 
ful as those for linear systems. Moreover, the mathematical tools behind 
bilinear system analysis include not only many of the vector space 
techniques that are so valuable in linear system theory but also a number 
of tools drawn from the theories of Lie groups and differential geometry* 
This points out the necessity of viewing the dynamical system of interest 
in its most natural setting, rather than forcing it into the vector 
space framework. 

Both the Lie theoretic and vector space settings have proven to 
be useful in the study of bilinear estimation problems, and a number of 
important and illuminating results have been uncovered. It is the pur- 
pose of this paper to explain the practical and mathematical importance 

/ 

of these results. In Section II we view the basic mathematical formulation 
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of interest to us , and in Sections III-V we discuss several important 
practical problems that fall within this framework. These include a 
large class of synchronous coomuni cation problems, the problems of 
altitude estimation and the tracking of an orbiting vehicle, and the 
estimation of pollutant concentrations in a diffusive atmosphere subject 
to random wind effects and fluctuations in the source rate. These by 
no means cover all of the potential applications for our bilinear esti- 
mation results, but they do indicate the range of problems that can be 
considered. We refer the reader to the references for other applications. 

In Sections VI-VIII we review the techniques that have been devel- 
oped for bilinear estimation. In Section VI we describe a class of 
bilinear systems for which complete analysis is possible, and we display 
the optimal, nonlinear, finite-dimensional estimation equations for an 
example. The Lie- theoretic significance of these results is also dis- 
cussed. In Section VII the use in estimation system design of harmonic 
analysis on groups is e^lored in the context of synchronous communication 
and orbital tracking. This represents a potentially powerful tool in the 
design of high performance, implementable estimation systems. A second 
approximation method, based on the truncation of the cumulants of a 
random process, is studied in Section VIII. This approach is more closely 
related to the usual vector space techniques. 



II. Stochastic Bilinear Systems 


In this section we briefly describe the several classes of 
stochastic equations that will be considered in the remaining sections 
of this paper. The basic deterministic bilinear equation considered in 
the literature [1] - [9] is 

x (t) (2.1) 

where the A^ are given n x n matrices, the u^ are scalar inputs, and x 
is either an n- vector or an n x n matrix. As discussed in [1] , the 
additive control model 


- 


X(t) “I + 


N 

2 

i=l 


u^(t) 


A. 

1 


x(t) = B + y u. (t) B. x (t) + Cu(t) (2.2) 

L° i=l 1 1 J 

(here u is the vector of the uj can be reduced to the form (2.1) by 
state augmentation. Also, if we apply the bilinear feedback law 

u ± <t) = v ± (t) t (x(t) ) + (t) (2.3) 

where v^ and are scalars, and Z^ is a scalar-valued linear function 
of x, our system equation becomes 


x(t) 




( v i (t) ^ i (x(t) ) + U^(t)) A^] x(t) 


(2.4) 


which involves products of state variables. By including several feed- 
back paths , we can obtain essentially arbitrary polynomials in the state 
variables. 

In this paper we will consider equations such as (2.1) in which 
the Uj, are. stochastic processes. Such systems have become considered by 
several authors [9] - [27]. We refer the reader to [10] - [16] for 
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detailed discussions of the properties of such stochastic models. 

One must be careful in considering stochastic versions of (2.1) . 
For instance, if u(t) is a vector zero mean white noise with 

E[u(t) u(s)] = R(t) 6 (t-s) (2.5) 

the I to stochastic differential analog of (2.1) is 

. N N 

dx(t) = { [A + ~ y R. . (t) A.A.]dt + y A.dv.(t)} x(t) (2.6) 
° 2 i,j=l 13 13 i=l 1 1 

where v is the integral of u (i.e., it is a Brownian motion process). 

We define L = {a.}_. to be the matrix Lie algebra generated by 
l LA 

N 

{a.} — i.e. , it is the smallest subspace that contains the {a^} and 

i=o 

is closed under the comnutator product 

[t^, M 2 1 = M 1 M 2 - M 2 M x (2.7) 

One can show that in the deterministic case with x an n x n matrix, if 
x(o) is an element of the matrix Lie group 

B 6 B 

G - {exp l} = {e ^ e ^ . e m | B. e l} (2,8) 

G l 

then x (t) is an element of G (for all t >_ 0) . In order to make a similar 
statement in the stochastic case when u is a white noise, we must include 
a correction term, as indicated in (2.6). 

Another case of considerable importance arises if u is generated 
by a finite dimensional linear diffusion process 
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d K (t) = F (t) 5 (t) dt + g ( t) dw(t) + a(t) dt (2.9) 

u(t) = H (t) C (t) (2.10) 

where a, F, G, and H are known and w is a standard Brownian motion process 
(E (dw(t)dw' (t) ] = I dt) . In this case, x by itself is not a Markov pro- 
cess, but the pair (x, £) is. Augmenting the state with £, we obtain a 
stochastic equation of the form (2.4) , where the = 1 and some of the 
are white noises while the others are zero (see several examples in 
the following sections). We note that one can show [16], [17] that in 
this case no correction term need be added to (2.4). Also, the right- 
hand side of (2.4) does not satisfy the global Lipschitz conditions 
often assumed in proving the existence of solutions to I to differential 
equations [28], [29]. Again, one can shown [15], [20], [30], [31] that 
this causes no problems in the case when (2.4) arises from (2.1) driven 
by the colored noise (2.9) - (2.10). 

In the remaining sections of this paper, we will consider the 
estimation of processes described by stochastic bilinear equations of 
the types just discussed. We now briefly describe the various types 
of measurement processes that will be considered. We also refer the 
reader to the references [16] - [27] for more on these estimation pro- 
blems. 

One very important measurement process consists of linear 
measurements corrupted by additive noise 

dz(t) = L(x(t))dt + dv(t) (2.11) 

where L is a linear operator (recall X is either an n-vector or an n x n 
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matrix) and v is a Brownian motion process. A second observation model is 
the "multiplicative measurement noise” case 

Z(t) = X(t) V(t) (2.12) 


in which Z, X and V are all n x n matrices. The third and final measurement 
process is described by a bilinear dynamical equation 


dz (t) = { [A + \ 

O 2 


N 

I 


i,j=l 


R. . (t) A. A,] dt 
ID ID 


N 

+ y [l. (x(t)) dt + dv. (t)] A.} z(t) (2.13) 

■ _ <i 1 XX 


Examples of each of these processes will be given in the next few sections. 

We close this section by noting that many but not all of our 
results are motivated by considering these estimation problems in the con- 
text of Lie group theory. We refer the reader to [1] - [5], [10], [16], 
[17], and [20] - [23] for more on this subject. 



III. Estimation Problems Arising in Communication Applications 


An important problem in a large number of communications appli- 
cations [10], [17], [19], [24], [25], [32] - [37] is the processing of 
a signal of the form 

r(t) = A(t) sin (w t + <j>(t) + v(t)) + N(t) (3.1) 

c 

where w is a known carrier frequency, $ is some type of modulating 
c 

information, v is a random phase drift, A is the sinusoidal amplitude, 
possibly containing modulating information and/or noise, and N is 
additive channel noise. As discussed in [10], [17], [19], [24], [25], 
and [34] , a number of specific problems that fit into the general form 
given by (3.1) can be modeled by equations of the type described in the 
preceding section. In this section we illustrate these ideas by 
considering several specific examples. 

Example 1 : We consider a phase tracking problem of importance in radio 

navigation systems such as Omega [38] . This problem has been studied 
in [24], [25], [32] - [34] and [36]. The solution technique developed 
in [24], [25] is discussed in Section VII. 

Suppose we receive the signal 

z (t) = sin 6 (t) + r 1 ^ 2 (t) w(t) (3.2) 

where 

0(t) = w t + f (s) ds + 0 (3.3) 

C J o 

o 

and v and w are independent standard Brownian motions, q(t) ^ 0, 

r(t) >0, and w c > 0. Also 0 Q is a random initial condition independent 

of v and w. We desire to track the signal phase — i.e., we wish to 


- 7 - 



- 8 - 


estimate 9(t) mod 2iT given {z(s) I 0 ^ s <_ t}. Equation (3.2) is, of 
course, only formal, since w is white noise. The I to differential 
forms of (3.2) and (3.3) are 

d9(t) = w dt + q^ 2 (t) dv(t) , 9 (o) =9 (3.4) 

c o 


dz(t) = sin 9 (t) dt + r 1 ^ 2 (t) dw(t) (3.5) 

and we take as our optimal estimation criterion the minimization of 

E t (1 - cos (9 (t) - 0 (t) ) | z (s) , 0 £ s £ t] . 

Noting that we are essentially tracking a point on the unit 
1 2 

circle S in the plane R , we can reformulate our problem in Cartesian 
coordinates . Let 


then 


= sin 9 (t) , 


x^ = cos 6(t) 


(3.6) 


”'dx 1 (t) ' 
_dx 2 (t) _ 


”-q(t) dt/2 w c dt + *3^^ (t) dv (t) 



_-(w c dt + q 1 ^ 2 (t) dv (t)> -q(t) dt/2_ 


1 

X 

K> 

ft 

L 


(3.7) 


dz(t) = x^t) dt + r 1/2 <t) dw(t) (3.8) 

which are of the bilinear process - linear measurement type discussed in 
Section II. Note that (3.7) describes what appears to be a damped 
oscillator? however the damping terms are the correction terms required 
by i to calculus, and one can show that 
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E (x 2 (t) + x 2 (t)) = 1 


(3.9) 


(see [16], [17] for further discussion). 

In Cartesian coordinates our estimation problem is to choose 
an estimate (x^t), * 2 (t) ) on the unit circle - i.e., such that 

~ 2 2 

x^t) + x 2 (t) = 1 (3.10) 

If we use the least squares criterion 

J= 1/2 Ettx^t) - x^t)) 2 + (x 2 (t) - x 2 (t) ) 2 [z (s) , 0 < s < t] (3.11) 

subject to (3.10) ,or equivalently subject to 

x^lt) = sin 0(t), x 2 (t) = cos 0(t) (3.12) 


our criterion reduces to 

J = E[1 - cos (0 (t) - 8(t)) |z(s), 0 < s < t] (3.13) 

— i.e. (3.13) represents a contrained least squares criterion. One can 
show that 

(x L (t|t), x 2 (t|t)) 

(x x (t), ^(t)) = — rz== (3 - 14) 

Vx 2 (t|t) + x 2 (t]t) 


or 


0(t) 


tan 


-1 


x^tlt) 

x 2 (t 1 1) 


(3.15) 


where 
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x^tjt) = ElxMt) |z(s) , 0<s< t],i = 1,2 (3.16) 

Referring to Figure 3.1 we can see the geometric significance of this 
criterion. We note that one can show that 

P(t) =Vx^ (t|t) + X 2 (t|t) <_ 1 (3.17) 

and the quantity P (t) is a measure of our confidence in our estimate . 
Specifically if 0 is a normal random variable with variance y, then 
(see [16] - [18], [24], [25]) 

P = V[E( sin 0) ] 2 + [E (cos 0)] 2 = e -Y/2 (3.18) 


so y = 0 (perfect knowledge of 8) P = 1 

and y = » ( no knowledge) p = 0. 

Example 2 ; Consider the demodulation of an FM signal in the presence of 
both phase and additive channel noise. Specifically, suppose the received 
signal process is 


dz(t) = sin (w c t + g J x(s) ds + J e^ 2 (s) df(s)) dt + r 1//2 (t) dw (t) 


(3.19) 


where x, the modulating information to be recovered, arises from a linear 
diffusion process 

dx(t) = a(t) x (t)dt + q 1/2 (t) dv(t) (3.20) 


Here f, w, amd v are independent standard Brownian motion processes. 

Equations (3.19) and (3.20) can be replaced by equivalanet 
equations in a manner similar to that used in Example 1. Our state 
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Fig. 3.1 Illustrating the Geometric Interpretation of the Criterion 
£ [l - cos (0 -9 )] 
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equation becomes 



dz = x^ dt + dw(t) (3.22) 


where x^ = x and x^ = 1 (this is the type of augmentation we use to 
"bilinearize" linear systems [1]. Note the products x^^ x 3 and x 2 x 3 in 
(3.21). 

Example 3 : In this example we consider an FM problem with phrase noise 

only. This problem was considered in 116] - [19] , and we refer the reader 
to [19] for further discussions of examples of this type. Suppose we 
observe the signal 

z^(t) = sin (w c dt + J h(s) x (s)ds + J q^^(s) dw(s)) (3.23) 

o o 

where x is given by (3.20) and w, a standard Brownian motion independent 
of x, represents a random phase drift. A number of physical sources for 
such noise are discussed in [19] . We note that in standard FM systems 
involving limiter-discriminators, additive channel noise is processed in 
such a manner as to yield frequency and phase deviations (see [37], [39]). 
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Also, such models arise in the problem of tracking drifts in high 
quality oscillators used as frequency standards or high-precision clocks 
[19], [40], in the problem of recovering velocity information from Doppler- 
shifted signals [19], [41]. 

As discussed in a number of references, given the signal (3.23), 
there are methods for generating the additional signal 


/ 


z n (t) = cos (w_t + / h(s) x(s) ds + J q 1//2 (s) dw(s) (3.24) 


(see the method described in [19] in which the total phase of (3.23) is 
reconstructed with the aid of cycle counters) . Defining 


Z(t) = 


z 2 (t) z x (tn 


-z^t) z 2 <t) 


X(t) 


/ 


cos (w t + / h(s) x(s) ds) 

c 


W(t) = 


|-sin (w c t + J h(s) x(s) ds) 
o 

cos J q^ 2 (s) dw(s) 
o 

j-sin / q^ 2 (s) dw(s) 


(3.25) 


/ 


sin (w t + / h£s> x(s) ds) 

c 


cos (w^t + J h(s) x(s) ds) 

° (3.26) 

sin J q^ 2 (s) dw(s) 


cos J q 1 ^ 2 (s) dw(s) J ( 3 . 


.27) 


we find that 


dZ = 


-qdt/2 


-(w + hx) dt - q 1/2 dw 

c 


(w c + hx)dt + q^ 2 dw 


-qdt/2 


(3.28) 
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which is a bilinear observation equation. Similarly, X and V satisfy 
bilinear stochastic differential equations. Also, we have 

Z(t) = X(t) V(t) (3.29) 

— i.e. Z can be thought of as a measurement of X in multiplicative noise. 
Note that the estimation of X, instead of x, corresponds to tracking, 
rather than demodulating, the signal phase. 

Finally, we make several comments about the matrix Lie group 
on which X, V and Z evolve. This group denoted by SO (2) , is the group 
of 2 x 2 orthogonal matrices of determinant +1 — i.e., 

S0(2) = {x|x'X = I, det X = +l} (3.30) 

As developed in [16] - [19], each X £ SO (2) can be written in the form 


cos 9 


sin 6 


X = 


|-sin 0 cos 0J 

and SO (2) is isomorphic (as a Lie group, [42], 
Also SO (2) is an abelian Lie group — i.e. 


0 £ Mr, TT] (3.31) 

[43]) to the circle S . 


X Y = Y X X, YE SO (2) (3.32) 

and the multiplication (3.32) corresponds to the mod 27T addition of the 
corresponding angles in the representation (3.31). We note that a con- 
sequence of the commutativity of SO (2) is the fact that Z can be written 
in both the bilinear form (3.28) and the multiplicative form (3.29). 

Example 4 t As discussed in [17] , the real line R 1 and the circle S 1 
are essentially the only abelian Lie groups , since any abelian Lie group 



-15- 


G is isomorphic to a direct product of a number of copies of each. For 
instance, the group D of nonzero complex numbers is isomorphic to 
R* x S 1 under the map 

(r, 6) — ^ e r+10 rCR 1 , 0e[-ir,Tr) (3.33) 

2 

Consider the R process x' = (x^ x 2 ) 

dx(t) - A (t) x(t) dt + B ( t) dw (t) (3.34) 

where w is a standard two-dimensional Brownian motion, and also consider 
the D-valued signal process 

y <t) + iy.(t) 

z(t) = e 1 (3.35) 

where 

dy(t) = x(t) dt + dv(t) (3.36) 

Here v is a 2- dimensional Brownian motion process independent of x. We 
note that z satisfies a fcoraplex) bilinear stochastic differential equation 
and also that we have the multiplicative form 


z(t> 


-[/■ 


(t) 


+ iv 2 (t) 


][/ 


(x 2 (s) 


ix 2 (s)) 


ds 


(3.37) 


Thus z is both amplitude and angle modulated, and the noise is a multi- 
plicative lognormal process [19], [44], including both phase and ampli- 
tude noise. 

We note that the multiplicative lognormal noise process in (3.37) 
is an irrportant model in some optical communication problems [44] . In 
many cases, changes in the transmission medium — e.g. , turbulence in the 
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atmosphere — cause variations in the refractive index of the air. This 
disturbance can be modeled [44] as a multiplicative lognormal noise pro- 
cess as in (3.37). Also, if v^ and are zero, (3.37) is identical to 
(3.29) if we observe that the set of unit modulus complex numbers, as well 
as SO (2) , is isomorphic to S^. 



XV- Estimation of Rotational Processes in Three Dimensions 


As we have seen, many communication problems can be placed in the 
framework of estimation of processes evolving on the circle S 1 — i.e., 
processes of rotation in one dimension. In this section we formulate 
several problems of practical importance involving rotation in three 
dimensions. As we shall see in this and in later sections, these problems 
are considerably more difficult than the one dimensional problems, since 
rotations in three-space do not commute [13], [22], [27], [45], [46]. 

The problem of estimating and controlling the angular velocity 
and orientation of a rigid body has been studied by many authors [9] , 

[22], [27], [45] - [48] and is of great importance in many aerospace 
and inertial navigation applications. Such problems are by no means 
trivial, and most of the techniques that have been developed are sub- 
optimal in nature. One feature of the rigid body orientation - angular 
velocity problem that has received some attention in the past is that 
the space of possible orientations is a Lie group [22], [27], [48], and the 
combined orientation - angular velocity space is the "tangent bundle" 
of the orientation space and thus is a homogeneous space [49] . The 
framework of differential geometry and Lie theory has proven useful 
in studying rigid body rotation problems. In fact, there are Lie-theoretic 
interpretations of four of the most widely used representations of the 
attitude of a rigid body — direction cosines, Euler angles, unit 
quaternions, and Cayley-Klein parameters. 

We will consider here only the direction cosine description and 
refer the reader to [22] and [48] for discussions of the other representa- 
tions. 

- 17 - 
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The orientation of a rigid body can be specified in terras of 
the direction cosines [46] between two sets of orthogonal axes — 
one rotating with the body and one an inertial reference frame. The 
direction cosines are usually given as a 3 x 3 orthogonal matrix X of 
positive determinant — i.e. 


X'X “ I 


det X — +1 


(4.1) 


The set of all direction cosine matrices forms the matrix Lie group 
S0(3) [1], [16], [22]. If the 3-vector £(t) is the (properly coordinatized, 

[46]) angular velocity vector of the body with respect to inertial space, 
the time evolution of the orientation of the body can be described by the 
bilinear equation 


C t <t> R A )X(t) 


where X(t) £ SO (3) and the R^, given by 



(4.3) 


form a basis for so(3) , the matrix Lie algebra associated with S0(3). 

It is well known and easy to check that SO (3) is not an abelian 
Lie group and, equivalently, that so(3) is not an abelian Lie algebra 
(i.e., commutator products are not identically zero). In fact SO (3) is 
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a simple Lie group [16], [22], [50], and it is this fact that makes the 
study of dynamics on S0(3) so difficult. Recall that in the SO(2) case 
we were able to represent solutions to bilinear equations by terms of 
the form 


t 

X (t) * ejq> (R f £ (s) 
o'' 


ds) 


(4.4) 


where 



(4.5) 


is a basis for the one - dimensional Lie algebra of SO(2). Wei and Norman 
[51], [52] have shown that one can obtain similar local expressions for 
the solution to equations of the form (2.1) but that such solutions are 
global only in certain cases — i.e., if the underlying Lie algebra is 
solvable (see [10], [16], [22], and Section VI of this paper for a 

discussion of the significance of this statement) . As simple Lie algebras 
are not solvable, we obtain no such global representation here, and as 
we shall see, we must resort to suboptimal methods in the design of 
attitude estimation systems. We also note that the local Wei-Norman 
representation of the solution of (4.2) corresponds to the Euler angle 
description, which is well known to exist only locally (see [22] and 
[46], where this fact is related to the phenomenon of "gimbal-lock") . 

Suppose now that the angular velocity vector in (4.2) is stochastic. 
Specifically, we suppose E, satisfies 


1/2 

d£(t) = f(t)dt + A(t) £ (t)dt + Q ' (t) dw(t) 


(4.6) 
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where f and Q ^ 0 are known, £(0) is normally distributed, and w is a 
standard 3-dimensional Brownian motion process independent of fj (0) . Here 
f can be thought of as a vector of (known) torques acting on the body, 
and the Brownian motion term represents random disturbances (e.g. , those 
caused by noisy responses of control devices, such as reaction jets, that 
are used to implement the desired torque f, or the effect of a gravity 
gradient) . 

Note also that the angular velocity equation (4.6) that we have 
postulated is simpler than the usual nonlinear Euler equations [46], 
Equation (4.6), with suitable choice of f. A, and Q, can be viewed as 
a reasonable approximation if: (1) the rigid body is nearly spherically 
symmetric (so that the principle moments of inertia are almost equal — 
in this case the nonlinear terms in the Euler equations are small and may 
be lumped into the random disturbance term) ; or (2) we linearize Euler's 
equations about a nominal (which might be included in the f (t) term) j 
or (3) we make Q(t) large enough so that the nonlinear effects can be 
viewed as process noise. 

We now can describe two different measurement processes — 
one motivated by a strapdown navigation system, and the other by an 
inertial system in which a platform is to be kept inertially fixed. 

In a strapdown system, [46] one receives noisy information about either 
angular velocity or incremental angle changes. Assuming that the size 
of the increment is small, either type of information can be modeled 
by the I to equation 

dz (t) = C (t) £ (t)dt + S 1/2 (t) dv(t) 


(4.7) 
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where S > 0 and v is a standard Brownian motion process, independent of 
This equation is the precise analog of the usual (formal) observation 
equation (for rate gyroscopes) 

z(t) = C(t)£(t) + v(t) (4.8) 

Here v is white gyro noise. Another possible observation model is pro- 
vided by integrating gyroscopes, in which case we observe 

m(t) = C (t) J K (s)ds + N(t) (4.9) 

o 

Here N represents a gyro drift. Usually when using a model like (4.9) , 
one assumes that the drift is a correlated process. This adds no real 
difficulty to the problem, since there are simple techniques for handling 
measurements with additive colored noise [53]. In fact, the consideration 
of the process (4.9) rather than (4.8) adds no difficulty to the analysis, 
and thus we will concentrate on (4.8). 

In the usual strapdown system the information (4.7) is processed 
by a "direction cosine computer," which produces Z (t) , the solution of 


dZ(t) = 


S R.dz. (t) + — 

■^i 11 2 

1=1 i 



3 

5 S, . (t) R. R.dt 
j«l « 1 ^ 


Z(t) 


(4.10) 


Our problem is to use the information supplied by z or Z to compute "good" 
estimates of the angular velocity £ and the orientation X. Thus, if 
we take (4.7) as our basic measurement equation, we have linear observations, 
while (4.10) yields a bilinear measurement process. 

A second type of observation process is suggested by an inertial 
system equipped with a platform that is to "instrument" (i.e. remain fixed 
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with respect to) the inertial reference frame . Suppose we use the 
notation b-frame for the body- fixed frame, p-frame for the platform 

Q 

frame, and i- frame for the inertial reference frame. Letting C p (t) 

oc 

denote the direction cosine matrix of the 8-frame with respect to the 
a- frame, we have 


X(t) = C^{t) (4.11) 

Also, by noting the relative orientation of the platform and the body 
(perhaps by reading of gimbal angles [46] , we can measure 


Let 


M(t> = c£(t) 
v (t) = cP(t) 


(4.12) 

(4.13) 


Then V(t) represents platform misalignment -- a drift of the platform with 
respect to inertial space due both to drifts in the gyroscopes used to 
sense rotation of the rigid body and also to inaccuracies in the 
mechanism that rotates the platform relative to the body in order to keep 
it inertially fixed. If we model gyro drifts and the other inaccuracies 
as Brownian motion processes, a reasonable model for V is as an SO (3) 
Brownian motion, [13], [16], [22], [54]: 


dV(t) = 


^R i dv i (t) 


i=l 


+ ± 
2 


22 

i=l i=l 


S. . (t)R.R.dt 
ID l D 


V(t) 


(4.14) 
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where v is the Brownian motion process with E[dv(t)dv’ (t)] = S(t)dt. 

Using elementrary properties of direction cosine matrices [46] , we have 

M (t) = C^(t) = C*(t) Cp(t) = C*(t) [C*(t)]’ - X(t)V*(t) (4.15) 


(multiplicative observation noise) and, using (4.2) and (4.14) (plus the 
fact that R^ = -R^) , we have the following stochastic differential 
equation for Mi 



' 3 1 


3 ! 3 

dM(t) = 

V R.s. (t) 
[i=l 1 1 

M(t)dt + M(t) 

•XW t,t 2.I 1 8 ii ,t, Vj dt 

_ i=l 1,3*1 J 


(4.16) 

Again, we wish to consider the problem of estimating ? 30(3 X given the 
observation process M. 

In many inertial systems the type of information that is avail- 
able is in the form of "pulses" [46] from the gyros. If this is the case, 
it is logical to take the incremental equation (4.10) as the basic sensor 
equation. Also, for the second observation process, if we take the 
incremental gimbal angles as the quantity we observe, equation (4.16) 
becomes the basic measurement equation. 

Now consider (4.2) with the 3x3 matrix X replaced by the 3-vector 

x. Assuming that x' (0) x(0) = 1, we have that x' (t) x(t) = l,Vt — i.e., 

2 

x evolves on S , the unit sphere in 3-space. The study of random processes 


with constant "energy" is of importance in a number of fields — including 
DC to DC conversion [55] , statistical mechanics [56] , and satellite orbital 
analysis [16], [57], [58], We briefly describe a simplified version of 
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a satellite tracking problem of great practical importance. 

Consider a satellite in circular orbit about some celestial body, 
such as the moon. Because of a variety of effects including anomalies in the 
gravitational field of the body, effects of the gravitational fields of 
nearby bodies, and the effects of "solar wind", the orbit of the satellite 
is perturbed. For simplicity, we assume that the perturbations of the 
°zfrit are tangential only — i.e. the radial effects are unimportant or 
have been corrected for. In this case, a stochastic version of (4.2) 
arises. Suppose the angular velocity E, of the vehicle with respect to 
body can be written the form 

£(t) = f(t) + w(t) (4.17) 

where f is the nominal orbit ("carrier") angulary velocity and w is a white 
perturbation with 

Elw(t)w(s)J « Q(t)6(t-s) (4.18) 

The stochastic analog of (4.2) then is 

3 3 

dx(t) = {[ I f.(t)A + ^ Q. . (t) A.A.ldt 

i=l 1 i , j=l iD 13 

3 

+ y A.dw. (t)} x (t ) (4.19) 

i=l 1 

If we are then given noisy observations of the satellite position 

dz(t) = H (t) x(t)dt +. R 1/2 (t) dv(t) (4.20) 

where v is Brownian observation noise independent of x and R(t) > 0, our 
problem is to estimate x(t) * 
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We note that the assumption that the angular velocity perturbations 

are white is a simplification* For instance, the anomalies in the moon's 

gravitational field are spatially correlated and constitute a random 

field [29] , [57] , [58] , which can be estimated [58] from the observed 

data (4* 20). However, the computations involved in such an estimation 

are of such magnitude that they must be computed off-line* The simpler 

model (4*19) may lead to simple but accurate on- line tracking schemes 

(see Section VII). Also, by including perturbations with time correlation 

consistent with the period of the orbit, we may be able to make estimation 

systems based on (4*19), (4.20) "smarter" with only minor complications 

in tracking system design. We note that (4*19) , (4.20) is a direct 
2 1 

S analog of the S incoherent oscillator tracking problem described in 
Exairple 1 (incoherent orbiters?) • 

In fact, we can again use the constrained least-squares criterion: 

minimize 

E[(x(t) -3c(t))' (x(t) - S?(t)) | z(s), 0 £ s £ t] (4.21) 

subject to 

II X (t) | | 2 = x'<t) x(t) = 1 (4.22) 

It can be shown that, as in the problem, 

x(t| t) 

II x(t[t) II 


x(t) 


(4.23) 



V. Estimation of Mr Pollution 

The problem of estimating the concentration of pollutants in 
the air is a vital first step toward the goal of maintaining the air 
pollution levels within safe limits . A recently developed stochastic model 
for air pollution 163], [64] involves partial differential equations 
which, when discretized, become discrete-time bilinear equations 
(see [65] , [66] for discussions of discrete-time bilinear systems) . The 
advection-diffusion model of [63] , [64] , which is a generalization of 
the widely-used steady-state Gaussian plume model, accounts for the 
continuous fluctuation of meteorological factors by means of stochastic 
modeling. 

The transportation in air of a single- species pollutant is 
approximated by the advection-diffusion equation 

3c — 

— (x,y,z,t) = - V • V C(x,y ,z,t) + V(K V C(x,y,z,t)) 

3t c 

+ tf„<x,y,z,t) - v-V C (x,y,z ,t) + q (x,y,z,t) 
c c 

(5.1) 

where C is the pollution concentration? C is the mean of C; V and v are 

the mean and the zero-mean stochastic component of the wind velocity; 

Q and q are the mean and the zero-mean stochastic component of the 
c c 

pollution source rate; and is the eddy diffusivity. For the boundary 
conditions associated with (5.1), see [63], [64]. 

For practical implementation, the advection-diffusion equation 
(5.1) is discretized both in space and time, resulting in the finite 
dimensional system of equations 


- 26 - 
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A x(t+l) = [21 - A + E (V D ) ] x(t) + D + L Q c + T{x(t)) w(t) (5.2) 

r (x(t) ) = [G(x(t) ) L] 


w(t) 


v (t) 
q c (t) 


z(t) = H x(t) + £<t> (5.3) 

where x(t) is the vector of pollution concentrations in the cells of the 
spatial discretization, and A, L, EtV^), D, G, H, and I (= identity) are 
known constant matrices. The noise processes v(t) , q(t) , and £(t) are 
assumed to be independent, Gaussian, and white. Since T (x(t)) is linear 
in x ( t) , the system (5.2) is obviously of the discrete— time bilinear form, 
and it is driven by fluctuations in the wind velocity and source rate. 

We will not consider specific estimation schemes for the system 
(5.2) - (5.3); however, we remark that the cumulant method of Section VIII 
may provide a useful suboptimal approach to the problem. In [63] , [64] 
it is assumed that T (x(t)) is slowly varying, and a suboptimal filter, 
which employs a non-Riccati estimation algorithm using incremental 
covariance [67], [68], is designed for the resulting "linear" system. 

Finally, we refer the reader to [80] for a discussion of 
deterministic bilinear systems described by partial differential equations. 
Also, if we discretize in space only, we obtain continuous -time bilinear 
equations, and the techniques of Sections VI and VII may be applicable. 



V 1 - Finite Dimensional Optimal Nonlinear Estimators 


In this section we will consider the estimation problem for a class 
of systems evolving on nilpotent or solvable Lie groups [10], [16], [22], 

[69], The equations we will consider are motivated by the strapdown 
navigation system of System IV — 

d C (t) = F (t) i (t) dt + Q 1/2 (t) dw(t) (6.1) 

n 

X(t) = (A o + c ± (t) AJ X(t); X(o) = I (6.2) 

dz(t) *» H(t) % (t)dt + R 1/2 (t) dv(t) (6.3) 

where £ (t) is an n-vector, X (t) is a k x k matrix, z (t) is a p-vector, 
w and v are independent standard Brownian motion processes, Q > 0, 

R > 0, and £ (0) has a Gaussian density independent of w and v. 

The criterion for the optimal estimate X (t|t) will be the 
minimization of 

E ttr { (X(t) - X (t) ) ’ (X(t) - X(t) ) }|z(s) , 0 £ s < t}, 

where tr denotes trace. It is well known [28] that this minimum-variance 
estimate is given by the conditional mean 

x(t| t) = E fc [x(t) ] = E[x(t) |z(s) , 0 < S <_ t}. 

In general, the confutation of X (t|t) requires an infinite-dimensional system 

of equations, and approximations must be made for practical implementation 

(see the related comments in [28] and Sections VII and VIII) . Thus it is 

of interest to study systems for which the optimal minimum variance estimator 

is dimensional (and thus inf lementable on-line with a digital comp uter) . 

-28- 
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Throughout this section, we assume that {a^}^, t * le smallest Lie 
algebra which contains {a^, i = 0, l,...,n), is solvable; this is equivalent 

to the existance of a complex nonsingular k x k matrix P such that, for 

all elements A in {a. } . the matrix P A P _1 is in the upper triangular 

l LA 

form where = 0 for i > j end the other elements are arbitrary (see 
[69] for further details on solvable Lie groups and Lie algebras) . Thus 
we will assume that the matrices {a^} in (6.2) are in upper triangular 
form? this iitplies that X evolves on the solvable Lie group G g (k) of 
nonsingular upper triangular k x k matrices. 

First we consider the Lie subgroup G^(k) of upper triangular 
k x k matrices with a^ = 1, i = l, . . ., k; this is a nilpotent Lie group. 
The corresponding nilpotent Lie algebra L^(k) consists of the strictly, 
upper triangular k x k matrices . In this case , because the Peano -Baker 
series is finite, X (t) can be expressed in closed form in terms of a 
finite number of integrals in which the integrands are products of the 
components of fj. We can show that, for the system (6.1) — (6.3) 
evolving on G^(k) , the optimal estimate x(t|t) can be computed by a finite 
dimensional nonlinear filter. The starting point of the derivation is a 
closed-form expression for X (t) . 

For simplicity, the filter will only be derived for n = k = 3, 
but the result can also be proved by induction for higher-order systems. 
Let A = 0 and 



(6.4) 

Then {a^ A 2 , A 3 > is a basis for L^ (3). The solution of (6.2) can be 
expressed in closed form as 



x> 
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"l / ^)da J Z 2 (ojda + f <V ? 3 (a 2 )da 2 dff l 

o o J o o J 


X(t) = 


y^? 3 (a) da 


(6.5) 


Before stating the major theorem concerning the computation of 
(tjt) , we need two preliminary results. 


Lemma 6. I t Consider the linear system (6.1), (6.3), and define 
(for a _< t) 

t * 

E 15(a)] - 5(a|t) - E[£(<y) | *(8),0<»<t] (6.6) 

Then the conditional cross- covariance matrix 

P (o L , o 2 , t) = El^) - € (^ 1 1) ) (C(0 2 ) - ?(a 2 |t))"|z(s), 0 <s <t] 

(6.7) 

is nonrandom -- i.e., it is independent of {z(s), 0 < s < t). 

The proof of Lemaa 6.1 is based upon the fact that the error co- 
variance matrices for the linear Gaussian estimation and smoothing problems 
are nonrandom [71] . This lemma allows the off-line computation of 
P (<?! , 0 2 1 t) via Kwakernaak 1 s equations [72] (for Cf^ 

p (a x , a 2 , t) = Ptc^) (a 2 , 

_p(a l ) l f ' l ' , <T,a 1 ) H*(T) r'V) h(t) T (t, cr 2 )dT] P(a 2 ) 


( 6 . 8 ) 



where the Kalman filter error covariance matrix P(t) = P(t, t, t) is com- 
puted by the Riccati equation 

P (t) = F (t) P<t) + P(t) F' (t) + Q(t) - P(t)H'(t) R 1 (t) H (t) P(t) 

( 6 . 10 ) 

Lemma 6.2 : The conditional cross-covariance satisfies 

p(c, t, t) = K(t, a) p(t) (6.11) 

where 

- K’ (t,a) — - [F* (t) + P^V) Q (t) ] K' (t , a); K'(a, <T) = I (6.12) 

dt 

This lemma follows easily from some identities in [73], Equation 
(6.12) implies that K* (t, 0) is a transition matrix satisfying the semi- 
group property 

K' (t, 0) = K' (t, s) K* (s, a) (6.13) 

These properties of P(a, t, t) are crucial in the derivation of the 
next theorem, which states the major result concerning the computation 
of X(tjt). 

Theorem 6.1 ; Consider the system of equations (6.1) - (6.4). 

The conditional expectation X(t|t) is generated by the following finite- 
dimensional nonlinear filter. First, augment the state of the linear 
equation (6.1) by writing 
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d£(t) 

F (t) 0 

0 

0 

£(t) 

dn(t) 

I 0 

0 

0 

ri(t) 

da(t) 

J K^t^Je^da 0 

O 

-IF' (t) + P^ItjQtt)] 0 

a(t) 

dB (t) 

0 e l e 3 

0 

- [f (t) + p^ajQtt)! 

B(t) 


+ 

— — 

Q 1/2 (t, 

0 

0 

0 

dw(t) (6.14) 


dt 


n(0) = a(0) = 6(0) - 0 

dz (t) = [H(t) o 0 0] 


dt + R 1/2 (t) dv(t) 


(6.15) 

(6.16) 


?(t) 

n(t) 

a(t) 

B(t) 

where K(t, a) is given by (6.12), and P(t) is given by (6.10). Here 

til til 

K^(t, CT) denotes the j — row of K(t, (?) and e^ is the i— unit vector in 

R 3 . 

Then the Kalman filter [28] for (6.14) - (6.16) yields the 
conditional ejq>ectations £(t|t) , f)(t|t), a(t|t), and B(t|t). The con- 
ditional ejqpectation X(t|t) is ccmputed by 


X(t|t) = 


fi 1 (tj t) ri 2 (t|t) + y(tlt) 


1 

0 


n 3 (t|t) 


(6.17) 
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where 

A /N t 

dy(t| t) = n 3 (t|t) + J P 13 (t, a, t) da] dt 

o' 


+ [a'(t|t) + §' (t| t) ] P(t) H* (t) R 1 (t) [dz(t) - H(t) c (t| t)dt] 
y(0|0) = 0 (6.18) 


Proof : All the terms in (6.17) result from linear filtering theory 

[28] , except for In the derivation of (6.18) we will frequently 

use a version of Fubinis theorem [74] which allows us to interchange 
integrals with conditional expectations. Notice that 

r t O 

Y(t|t) = E: [ J J ■ £ «x > ? 3 (a 2 ) da 2 (6.19) 

o o 


Thus if we define y by 

dy(t) 


t? L (t) 



? 3 <a) da] dt 


= C x (t) n 3 (t) dt 


( 6 . 20 ) 


Y (0) = 0 

then Kushner's equation [28] yields the conditional ejg?ectation 


dy(t|t) = E t [5 1 (t) n 3 (t)] dt 

+ {E t [Y(t)C t (t)] - Y(t|t) £(t| t) } H*(t)R _1 (t) [dz (t) - H(t) E, (t|t) dt) 


From the definition of P(a^, , t) it follows that 


( 6 . 21 ) 
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E t I€ 1 (t) n 3 (t)l = q(t|t) ti 3 Ct J t) + J' p 13 (t r a, t) da 


( 6 . 22 ) 


Considering the second term in (6.21), 


E t [ y(t) q(t>] - Y(t|t) ^(tlt) 

= f f 1 {E t [q(a 1 ) q(c t 2 ) qU)J- q(t|t) E t tq(a 1 ) qu 2 )]> da^ 

o o 

-// {p u (a 1 , t, t) E t (5 3 (a 2 >] + P 3i (<^ 2 / t , t) E t [^ 1 (a 1 )]} da 2 da 1 

o o 

(6.23) 

where the last term in (6.23) is a result of the definition of P(a^, 0 2 , t) 
and the e:q?ansion of the third order moment of a Gaussian distribution 
[75]. 


Thus by Lemmas 6.1 and 6.2 
E t [Y(t)5'U)] - YU|t) £’<t|t> 


■ft {q(t, o x ) E t [? 3 (a 2 )] + K 3 (t, a 2 ) E t [q(a 1 >]} p(t) da^ 
= E t l J q(t, a x ) n 3 (a x ) + J W ^ K 3 (t ' a 2 ) da 2 da l J p(t) 
= <E fc IB’ (t) 1 + E t [o , (t)l) P(t) 


(6.24) 

(6.25) 


The fact that equation (6., 14) for a and (3 provides a realization of the 
argument in (6.24) is a direct consequence of (6*12) 

Further insight into the structure of the optimal nonlinear filter 
of Theorem 6*1 is provided by the equivalent formulation 
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dX(t| t) = 

3 

HI 

A i i i (t|tj) 

x(t|t) + e 13 

c J p 13 ^ t » t} d(J * dt 



i"l 






P 


- 

\ 



0 

K x (t, a) 

K 2 (t, a) 



t 




rs ss. 1 


/ 

0 

0 

K 3 (t, a) 

da + (a-(t|t) + 3‘(t|t))E 13 


J 

o 







0 

0 

0 




— 


- 

/ 


•P (t)H' (t) R 1 (t)[dz(t) - H(t) £(t|t)dt] (6.26) 

where E. . has a 1 as its (i, j) th element and zeros elsewhere. Thus the 
iD 

filter for X(t|t) contains a model of the original system (6.2) driven 
by the innovations process dv(t) = dz(t) - H(t) £ (t|t) and the outputs 
of a linear filter (see the block diagram in Figure 6.1). 

In addition to providing the optimal minimum variance estimate, 
it can be shown that the filter of Theorem 6.1 contains fewer states 
than the extended Kalman filter [28] for the same system. This. is due to 
the on-line computation of the "approximate" error covariance matrix in the 
extended Kalman filter. 

Theorem 6.1 can be extended in various ways. First, it can be 
extended to systems of the form (6.1), (6.3), and 



a 3 (t) 

?l(t) 

c 2 ( fc ) 



Y(t) = 

0 

a 2 (t) 

e 3 (t> 

Y(t) } Y(0) = I 

(6.27) 


. 0 

0 

a 3 (t) i . 




with nonrandom diagonal terms a^ ; such systems evolve on the solvable 
Lie group G g (3). Second, the results are valid for systems evolving on 
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G,(k) and for systems with nonrandom diagonal terms evolving on G (k) . 

N s 

Finally, the results may be extended to observations of the form of (2.12) 
or (2.13)? in these cases, it can be shown [20], [26] that the observations 
are, in a certain sense, "conditionally" linear. For all of these ex- 
tensions of Theorem 6.1, it can be shown that the optimal minimum 


variance filter is finite dimensioned. 



VII. The Use of Harmonic Analysis in Suboptimal Filter Design 


A very important result of harmonic analysis states that the 

eigenfunctions of the Laplace— Bet rami operator on a compact manifold 

2 

M are a complete set of functions in L (M, y) (where y = Haar measure) 

[43] , [76] . In this section we will develop suboptimal estimation 
techniques for bilinear systems evolving on compact manifolds by 
employing an approximation to the conditional density which is based on 
these eigenfunctions. 

First consider Example 1 of Section III which evolves on the Lie 
group S 1 , described by (3.4) - (3.5) or (3.7) - (3.8). As discussed in 
[24], the optimal (constrained least-squares) filter is described as 
follows. The conditional probability density of 0 given {z(s) , 0 s t} 
may be expanded in the Fourier series (notice that the trigonometric 
polynomials are eigenfunctions of the Laplacian on S^) 

+CO 

p(0, t) - V C (t) e in0 (7.1) 

n=- OQ n 


where 


c (t) = 
n 27T 


E [e ^ n0 ^ | z (s) , 0 s <_ t] 


= b (t) - ia (t) 
n n 


(7.2) 


Then 


dc (t) 
n 


2 

= -linw + q(t)]c (t)dt 
c 2 


dz(t)+2TTlm(c^ (t) )dt 


r (t) 


2i 


+ 2ttc (t) im (c. (t)) 
n 1 
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(7.3) 
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0{t) = tan X (a^ (t)/b^(t) ) (7.4) 

Since c = 4r and c = c* (where * denotes the complex conjugate) , 

we need only solve (7.3) for n ^ 1. The structure of the optimal filter, 

which is illustrated in Figures 7.1 and 7.2, deserves further comment 

(recall that c = b - ia ) . The filter consists of an infinite bank of 
n n n 

filters, the nth of which is essentially . a damped oscillator, with 

oscillator frequency nw c , together with nonlinear couplings to the other 

filters and to the received signal. Notice, however, that the equation 

for c is coupled only to the filters for c, , c , , and c , . This fact 
n 1' n-1 n+1 

will play an important part in our approximation. 

In order to construct a finite-dimensional suboptimal filter, 
we wish to approximate the conditional density (7.1) by a density 
determined by a finite set of parameters. Several examples of "assumed 
density" approximations for this problem are discussed in [16], [25], but 
the most useful involves the assunption that p(9, t) is a folded normal 
density (see [16], [17], [24]) with mode n(t) and "variance" y(t) : 

p(6, t) = — £ e” n Y<t)/2 e in(e ~ n(t)) = F(0, n(t), y(t» (7.5) 

2ir n=-» 

This density is related to the normal density in the following way: if x 
is a real random variable with density N(a?n,Y) / then 0 = x mod 2 tt has 
the density 

+°° 

F(ct;tl,Y) = 2 N(a + 2n7T;n,Y) (7.6) 

n=-“> 
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—i.e., we "fold" N(a;n,Y> around the circle. We note that the folded 
normal density is the solution of the standard diffusion equation on the 
circle {i.e., it is the density for Brownian motion processes) and 
is as important a density on S 1 as the normal is on R 1 . 

In this case, if c^ has been computed and if p(0, t) satisfies 
{7.5) , then can be computed (for any N) from the equation 


c 


N+l 


(27T) (H+1)2 - 1 |cj »<n+1) Ci <h+i> 


(7.7) 


Thus we can truncate the bank of filters described by (7.3) by approximating 
°N+1 by and substituting this approximation into the equation for 

C N - This was done for N=1 in [25] ; the resulting suboptimal filter 
equations are 


a. - (w b 
1 cl 


_ a 


2 a l> 


(z-27Ta^) 


1 1 3 4 4 2 

[ 2 ( 2T7 - 8lf (b l - V - 2lTa ll 


6 1 = - ( Vl + 2 b l> 


^- 27ra i> 3 2 2 

+ — [0TT a 1 b 1 (a 1 + b 1 ) ^TTa^] 


(7.8) 

(7.9) 


9 = tan’ 1 (a 1 A> 1 ) (7.10) 

In [25] , this Fourier coefficient filter (FCF) was compared to a phase- 
lock loop [32] and to the Gustafson- Speyer "state-dependent noise filter" 
(SDNF) [34], The FCF performed consistently better than the other systems, 
although the SDNF performance was quite close. 

Similar analyses and "assumed density" approximations can be 
applied to the other examples in Section III; the reader is referred to 
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[25], [28], [35], and Section VIII of this paper. 

Motivated by the success of the previous example evolving on 

s\ we will now extend these results to the system (4.19) - (4.20) 

2 

evolving on the sphere S . Following [3], [9], [10], we define the 

( p+ j~vector x Ipl consisting of the p^ 1 order moments (homogeneous 

/ 

polynomials) in ( 3 ^, x^, x^) : 

T 

rp- Pl \ /P-PrP 2 \ p 


M 



p- 


1 P 2 P 3 
X 1 x 2 X 3 f 


.1 Pi=P' Pil° 


i=l 


(7.11) 


If y satisfies the linear differential equation 


y (t) = Ay (t) 


(7.12) 


then y^ satisfies a linear differential equation 


y tpl (t) . A IpI y tpl (t> 


(7.13) 


1 

We regard this as the definition of A p . It can be shown that if x 
satisfies (4.19), then x^ satisfies 


dx Ip] (t) = 


V f.(t) A.& 1 + 2 A. [P V [P1 

i-i 1 1 i.j-1 13 1 j 


x Ip] (t) dt 


+ y a tpl x Ipl (t) dw.(t) 
i-1 i 


(7.14) 


The optimal (constrained least-squares) filter is given by the 
infinite set of coupled equations 
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dE t [x Ipl (t)] - 


^ f (t)A Ep] + ^ 2- • <t) A. [p] A. 

i=l 1 1 i,j=l 13 1 3 


ft?] 


EV pJ (t)] dt 


+ {E t Ix Ip] (t)x' (t) ] - E t [x [p3 (t)] E t [x' (t) ]} 


H'(t) R _1 (t) [dz(t) - H(t) E t Ix(t)] dt 


(7.15) 


;<« = . iMa. — (7 . 16) 

||E t [x I1] (t)] || | |x(t|t) 1 1 

The structure of this filter is quite similar to that of (7.3) - (7.4) — 

i.e., it consists of an infinite bank of filters, and the filter for x^ p3 

is coupled only to those for x and x^*^ 3 . 

The similarities with the previous example are further illuminated 

by considering the spherical harmonics [77] , [78] , which constitute an 

2 

orthonormal basis for the eigenfunctions of the Laplacian on S . We 
• 2 

introduce polar coordinates (0, <£) on S , where 0<^6<ir, 0<^<2ir, 
by defining 

= cos 0; x^ = sin 0 cos <f>; x^ = sin 0 sin (J> (7.17) 

The normalized spherical harmonics Y^(0. <f>) are defined by [77] 


,a , , ,m r I (2£+ ■1)"]^^ _ , im<(> 

t) = (-1) -jj-J Prices 6) . 


♦» - 'U <e '*> 


(7.18) 

(7.19) 


for H - 0, 1, ... and m = 0, 1 , . .., Z, where P^ in (cos 0) are the associated 
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Legendre functions. 

According to the remark at the beginning of this section , any 

2 

square-integrable function on S can be ejqjanded in a series of the form 

fie.oo - | I i^te, *) (7 - 20 > 

a, - u m= *v* 

where 

f 2v r* * 

C » = / I Y (0 '^ f(0 '^ sin 0d0d $ (7*21) 

o o' / £m 

In particular, the conditional probability density function for our 

2 

estimation problem on S can be expanded via the series 

00 £ 

p(6,<t>,t) = 2 I C £m (t) Y /» (0 '* ) (7,22) 

£=0 m«»-£ 

where 

C £m {t) = E[Y lm (0(t) ' ♦ ! z(s) ' 0 £ s 1 tl (7.23) 

The set of coefficients { c» (t) } for £ = 1, k and m = t 

is equivalent to the set of conditional moments {E^tx^ (t)]} for 
p = 1, k (see [3] for a detailed description of this equivalence). 

Thus a filter which generates {c^tt) } is equivalent to (7.15); in 
fact, it can be shown that such a filter consists of an infinite set 
of equations (similar in form to (7. 15)), and the equation for is coupled 

only to those for {C„ ; £=L~1, L, L+l; m =-£, ...,£} . The optimal estimate is 


x* (t) 


[^2C 10 (t) ( -tC^t) + cJ L (t)), KC^tt) "C 1]L (t) ] 
Si <C 2 0 (t) + 2 | C n (t)| 2 ) V2 


(7.24) 
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This filter is completely analogous to the S filter (7.3) - (7.4). 

We will again employ an "assumed density" approximation in order 
... 2 

to obtain a finite-dimensional filter. An S -analog of the folded normal 
density is obtained as the solution to the standard diffusion equation on 
S 2 : 


3p(9,<|),t) 

at 


- | 3(t) V 2 p(6,<!> f t) = o 


(7.25) 


The Green's function for (7.25) is given by the "bilinear series" [79] 


G(6,<}>,t; n,V,T) 


-I f ^(M) 4 (n.V) e -*■'*■*» J B<s)ds 


£.=0 m=-£ 


(7.26) 


This is the solution to (7.25) with initial condition equal to the 

singular distribution concentrated at (n,V). 

Thus by analogy with (7.5) , we assume that the conditional density 
2 

for the S estimation problem is of the form 


P <e, 4 >,t) = 2 I Yj m (n(t) / v(t)) e~ z (£+1) Y(t) 


(7.27) 


£=0 m=-£ 


In order words, (as defined in (7.22) - (7.23) is assumed to be 


C £m (t) = Y *« (n(t) » V(t)) e _ 2 £ (£+1,Y(t) 


£m 


(7.28) 


In this case, if and have been confuted, then 


Y(W -4 log [f (c 2 l0 (t) + 2[c u (t)| 2 )] 


(7.29) 
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cos n (t) 


c io (t) 


IC* 0 (t) + 2|C 11 (t)| 2 l 1/2 


ir 


sin f| (t) 


&\ c u (t)| 

lc 2 1Q (t) + 2 j c 11 (t) | 2 3 1/2 


(7.30) 


(7.31) 


If c n (t) 


0, then the density is independent of V(t); otherwise, 

2iv(t) 




(7.32) 


c n(t) 


Then {c , , m=-(N+l), ...» N+l) can be computed (for any N) from 

N+l,m 


'N+l ,m 


(t) = Y 


N+l ,m 


(nl«l.vwi.T (wl (N+2,T(t) 


(7.33) 


Thus we can truncate the bank of filters for {c^} by approximating 

{c ) via (7.29) - (7.33) and substituting these approximations into 
N+l 

the equations for {C VT }. 

N ,m 

The performance of this suboptimal filter will be tested by 
simulation, and the results will be presented in a future report. Other 
"assumed densities" will also be studied, including those resulting from 
degenerate diffusions or diffusions with unequal drifts around the three 
axes [ 3 ] . 

The techniques of this section can in theory be extended to any 
compact manifold by employing the eigenfunctions of the Laplace-Beltrami 
operator. For example, on the n-sphere S n these functions are the n- 
diroensional spherical harmonics [78]. Since S0(3) is isomorphic (as a Lie 
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group) to S /{+ i} (see [22]/ [43]) , suboptimal filters for some of the 
rotational estimation problems of Section IV can be constructed by 
means of spherical harmonics on s\ 



VIII. A Cumulant Method for Suboptimal Filter Design 


As we have seen, we are able to obtain finite dimensional optimal 
filters only for certain bilinear estimation problems. In the preceding 
section we described how harmonic analysis could be used to design high 
quality estimation systems for processes evolving on spheres and, more 
generally, on compact manifolds. In this section we describe a design 
technique which we call the cumulants method. This approach has been 
considered by several authors [10] , [59] , [60] and is related to 
statistical linearization techniques [60] - [62] . 

We wish to consider the estimation of x(t) given the observation 
process z, where these processes satisfy 

dx(t) = a(x(t) , t)dt + B(x(t) , t)dw(t) (8.1) 

dz(t) = H (t) x(t)dt + R 1/,2 (t) dv(t) (8.2) 

where a is an n- vector and B an nxm matrix of polynomials in the components 
of x, w and v are independent standard vector Brownian motion processes, 
and R > 0. All of the random processes considered in the preceding 
sections are of the form (8.1) (see Section II). We consider only the 
linear observation process (8.2) , but the analysis of this section can 
also be carried out for bilinear and multiplicative processes of the 
forms given by (2.12) and (2.13). 

We wish to compute the conditional moments 

I k k 

I V L ...k n {t|tl = E[x x (t)...x n n (t)|z(s), 0< s <t] (8.3) 

Recall that these quantities were of direct interest in the various 
estimation problems described earlier* For any twice continuously 
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differentiable real-valued function c|> of let 

$(t|t) = E[<f>(x(t)) | z(s), 0 _< s <_ t] (8.4) 

A 

We 'then have Kushner*s stochastic differential equation for <J> : [63] , [64]: 
d$ =' {^a* + j tr ) dt 

+ [^Hx - $Hx] ' R _1 [dz - Hx] (8-5) 

k k 

Note that if $ is of the form x^ ••• x n n * right-hand side of (8.5) 

consists solely of various conditional moments as in (8.3) (see for 
exanf>le (7.15)). This is a direct consequence of the fact that the right-hand 
sides of (8.1) and (8.2) contain only polynomial functions of x. 

A 

A major complication with these equations (when <)> is a moment 

(8.3)) is that they are all coupled together (as in (7.15)). The reason 

for this is the following : let the order of a moment iil ^ b® fhe sum 

K l* n 

of the k ± . Then, because of the polynomial nature of the various co- 
efficients, the orders of some of the terms on the right-hand side of 
(8.5) are at least one higher than the order of <fr. Consider the scalar 
example 

dx(t) - ax 2 (t)dt + Bx (t)dw(t) (8.6) 

dz(t) = x(t)dt + dv(t) (8.7) 


Then 

drt^ = cun 2 dt + [dz-m^tl (8-8) 

2 

dn^ = [2oan 3 + B m 2 3dt + [m^-n^m^] [dz-n^dt] 


(8.9) 
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dm 3 = [3can 4 + 3p 2 m 3 ]dt + [rt^-m^] [dz-n^dt] (8.10) 

From the preceding comments , it is clear that the implementation 
of these equations must involve an approximation i.e., a truncation 
of the infinite set of equations. For several reasons, the direct 
truncation method — setting to zero all moments greater than some 
given order - - can cause difficulties. First of all, there is no reason 
to expect the higher moments to be small, and in many cases (such as 
the Gaussian case) the sequence of moments is unbounded. In addition, 
if we use the fact that the moments are the coefficients of the Taylor 
series expansion of the characteristic function of x, the assumption that 
the higher moments are zero corresponds to assuming that the density for 
x is a sum of derivatives of Dirac delta functions [59], 

As suggested in [59] and [60], a more useful set of variables is the 
set of cumulants, which are the coefficients of the Taylor series expansion 
of the logarithm of the characteristic function. For the present dis- 
cussion we limit ourselves to the scalar case, although the vector case 
can be handled similarly. For a scalar variable the kth cumulant is a 
polynomial in the first k moments and, in fact, the first k moments 
and the first k cumulants contain precisely the same information. For 
example , the first 4 cumulants of the scalar process x(t) in (8.6) are 

k^tt) » m^(t) (8.11) 

k 2 (t) = m 2 (t) - m^(t) (8.12) 

k 3 (t) = n> 3 (t) - 3m 1 (t)m 2 (t) + 2m^(t) 


(8.13) 
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k 4 (t:) = m 4 (t) - 3m 2 (t) - 4m 1 (t)m 3 (t) + 12m 2 (t)m 2 (t) - 6m^ (t) (8.14) 

and we see 'that is the mean and k 2 is the variance. 

As discussed in [59] , a reasonable procedure is to set to zero 
all cumulants of order higher than some given number. Note that 
assuming k^ = 0\/i _> 3 is equivalent to assuming that x(t) is Gaussian! 

Note also that if we take k (t) =0, i ^ M, we can obtain equations for 
the corresponding higher moments of x(t). In this way, we can effectively 
truncate the infinite set of moment equations. For instance, if we 
return to the scalar example (8.6), (8.7), and if we assume k (t) = OVi > 4 
equations (8.8) and (8.9) are unchanged and (8.10) is replaced by 

dm 3 = [3an(m lf m 2 , m 3 ) + 36 2 m 3 )dt 

+ m 2 , m 3 ) - nyn^] [dz-m^dt] (8.15) 

2 2 4 

^ (m i' ** 2 ' m 3 > = 3m 2 + 4m 1 m 3 - 12 «^ 2 + 6m* (8.16) 

We note that this technique can be extended to the general vector 
problem (8.1), (8.2) with no conceptual, but some bookkeeping, difficulties. 
For instance, the cumulants approach provides an alternative to the Fourier 
series methods described in Section VII for the design of phase tracking 
and demodulation systems. An open question related to the cumulants method 
is that of performance analysis — e.g., how does performance improve as 
we retain more cumulants and do we achieve the optimal performance in the 


limit? 



IX. Conclusions 


In this paper we have discussed the practical importance and 
the mathematical analysis of several classes of bilinear estimation 
problems. We have seen that such mathematical models arise in a wide 
variety of applications , and we refer the reader to the references for 
further verification of this fact. We have also indicated how such 
estimation problems can be solved. In some cases, best explained in 
a Lie-theoretic framework, we have seen that finite-dimensional optimal 
estimation equations can be derived. For Other problems the tools of 
harmonic analysis have turned out to be most appropriate and extremely 
useful. Finally, a general but untried approximation method based on 
the truncation of the curoulants of a random process has been described. 
In conclusion, the class of stochastic bilinear systems is not only an 
appealing class of systems from an applications point of view , but it 
also is a highly structured class of systems for which analysis nearly 
as detailed and successful as that for linear systems is possible. 
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